Abstract. First Step to Success (First Step; Walker et al., 1997, 1998) is a
secondary-level intervention for students with behavior problems in early
elementary school. The purposes of this study were to assess whether effects in
student behavior and academics at posttest shown in a recent efficacy trial
(Walker et al., 2009) were maintained at follow-up and to examine the
relationship of implementation fidelity to outcomes. The findings showed that
although First Step’s initial impact was significant and positive across all
behavior and some academic measures, gains eroded 1 year after the intervention
was withdrawn. Results are discussed in the context of students’ experience of
yearly change in classroom environments, teachers’ variable behavioral
expectations and perceptions, and the need for intervention maintenance plans to
support sustainment of treatment effects.


As school districts face serious budget
cuts with concomitant pressure to improve test
scores, it is increasingly important that their
limited resources be used to support evidence-based
programs and interventions that address
widespread problems with high social and
economic costs such as antisocial behavior
exhibited by young children. The rates of
aggressive and antisocial behavior among
children have increased over the past 50 years and
constitute “a major public health problem for
society” (Connor, 2004, p. 28). Researchers


The contents of this research report were developed under a $5.9 million grant from the Department of 
Education. However, those contents do not necessarily represent the policy of the Department of 
Education, and one should not assume endorsement by the Federal Government. 

Correspondence concerning this article should be addressed to Michelle W. Woodbridge, SRI
International, 333 Ravenswood Ave, BS 124, Menlo Park, CA 94025; e-mail: michelle.woodbridge@sri.com







School Psychology Review, 2014, Volume 43, No. 3 


argue that if behavior problems are not cor¬ 
rected early, they can lead to serious behavior 
disorders that can persist over the life course 
(Kazdin, 1987; Moffitt, 1993; Webster-Stratton 
& Taylor, 2001). 

When children with emotional and be¬ 
havioral disorders receive special education 
services to help them succeed at school, they 
may still have lower school attendance rates, 
grades, and graduation rates than students 
without disabilities or with any other disability 
classification (Blackorby, Chorost, Garza, & 
Guzman, 2005; Wagner et al., 2003; Wagner, 
Newman, Cameto, Garza, & Levine, 2005). In 
many cases, this scenario of low attendance, 
grades, and graduation rates leads to a poor 
transition to young adulthood and adverse life 
outcomes for youth with emotional and behav¬ 
ioral disorders (Wagner & Davis, 2006), in¬ 
cluding a 57% likelihood of being arrested 
within 2 years of leaving high school, employ¬ 
ment instability, and risk of entering the adult 
mental health treatment system (Kessler, Chiu, 
Demler, & Walters, 2005; Kutash, Duch- 
nowski, & Lynn, 2006; Newman, Wagner, 
Cameto, & Knokey, 2009; Wagner et al., 
2005). 

FIRST STEP TO SUCCESS: A MODEL 
PROGRAM TO PROMOTE POSITIVE 
BEHAVIOR 

First Step to Success (i.e., First Step) is
an evidence-based program (Sprague & Perkins,
2009; Walker et al., 2009) for addressing
children’s behavior problems at school and 
teaching positive, prosocial behavior. It is a 
secondary-level intervention (i.e., imple¬ 
mented when children do not respond to pri¬ 
mary, school-wide universal prevention strat¬ 
egies) for early elementary students who have 
moderate to severe behavior problems 
(Walker et al., 1997, 1998). Grounded in a 
social-ecological model (Bronfenbrenner, 
1979; Schalock, 1989), the fundamental prin¬ 
ciple behind First Step is that when peers and 
adults who are central to children’s experience 
learn and systematically apply strategies for 
eliciting and reinforcing positive behaviors, 
long-term improvements in children’s behav¬ 


ior at school and at home can result. First Step 
is expected to achieve behavioral improve¬ 
ments by applying three modular components 
in concert: (a) universal screening for behavior 
problems; (b) classroom intervention involv¬ 
ing the target student, peers, and teacher; and 
(c) in-home parent education designed to 
strengthen parenting skills and the home- 
school relationship. 

The First Step screening module in¬ 
volves teachers identifying students in their 
classrooms who are at risk of, or already ex¬ 
hibiting, internalizing or externalizing behav¬ 
ior problems and evaluating the students using 
standard measures of antisocial behavior. 
Screened children who exceed criteria and/or 
cutoff points are considered appropriate can¬ 
didates for First Step; one student per class¬ 
room can participate in the program at a time. 

The First Step classroom intervention, 
usually implemented over a period of 10 to 12 
weeks, involves a trained behavior coach 
working with the classroom teacher to learn 
and systematically apply strategies for elicit¬ 
ing and reinforcing the student’s positive be¬ 
haviors (e.g., cooperation, self-regulation). 
The coach implements the initial 5 program 
days, modeling for the classroom teacher and 
peers the techniques and strategies to elicit and 
support positive behavior. In the classroom, 
the coach provides feedback and monitors the 
student’s behavior using visual cues (i.e., a 
green-colored card to indicate on-task behav¬ 
ior and a red-colored card for inappropriate 
behavior) and tallies points for positive behav¬ 
ior during timed intervals. Each program day 
has performance criteria that must be met be¬ 
fore proceeding to the next program day. If the 
reward criteria are met, the student earns both 
a classroom and a home privilege that was 
prearranged with the teacher and parents (e.g., 
a brief free-time activity). If the criteria are not 
met, that program day is repeated or the stu¬ 
dent is “recycled” to an earlier program day 
before proceeding. 

After Program Day 5, the classroom 
teacher is responsible for monitoring and re¬ 
sponding to behavior according to First Step 
protocols, with daily coach supervision and 
support. Gradually, the interval in which a 
student can earn points and praise is extended,
until eventually, the target student must work 
in blocks of multiple days to earn a single 
reward of higher magnitude. Thus, the pro¬ 
gram becomes more demanding, requiring the 
student to sustain acceptable performance for 
longer and longer periods to be successful. 

The First Step home module, Home- 
Base, is designed to enable parents and care¬ 
givers to build child competencies and skills in 
areas that affect school adjustment and perfor¬ 
mance. During six 1-hr weekly HomeBase 
lessons, the behavior coach works with par¬ 
ents to enhance skills involving (a) communi¬ 
cating and sharing at school, (b) eliciting co¬ 
operation, (c) setting limits, (d) problem solv¬ 
ing, (e) encouraging friendships, and (f) 
building confidence. The HomeBase module 
includes lesson plans, instructional guidelines, 
and parent-child activities for directly teach¬ 
ing skills to children. The coach begins Home- 
Base after the participating child has com¬ 
pleted classroom Program Day 10 by visiting 
the parents’ home or meeting them at another
convenient location. At the end of each ses¬ 
sion, the coach leaves materials with parents 
to support their review and practice of each 
skill with their child daily for 10 to 15 min. 

FIRST STEP’S EVIDENCE BASE 

Development of the evidence base for 
First Step spans more than 2 decades and 
includes positive findings from single-subject 
designs conducted with students of diverse 
backgrounds (Beard & Sugai, 2004; Golly, 
Sprague, Walker, Beard, & Gorham, 2000) 
and multiple-group design studies conducted 
primarily in Oregon schools (Golly, Stiller, & 
Walker, 1998; Walker, Golly, McLane, & 
Kimmich, 2005; Walker et al., 1998, 2014).
First Step also has been studied in conjunction 
with other behavior strategies (Carter & 
Horner, 2007, 2009) and interventions in 
tiered systems of support (Nelson et al., 2009), 
and it has been adapted to preschool settings 
(Frey, Faith, Elliot, & Royer, 2006; Gunn, 
Feil, Seeley, Severson, & Walker, 2006). Ran¬ 
domized controlled studies have further shown 
positive results on students’ behavior and ac¬ 


ademic engagement at posttest (Walker et al., 
1998, 2009). Walker et al. (2009) reported 
results from a large-scale randomized con¬ 
trolled trial of First Step with 200 first- 
through third-graders with externalizing be¬ 
havior problems conducted in a diverse urban 
school district in the Southwest. Outcome data 
were collected from teacher and parent sur¬ 
veys, classroom observations, and direct stu¬ 
dent academic assessments at baseline and 
postintervention. The results, after variance in 
the baseline measures was controlled for, in¬ 
dicated that First Step students had signifi¬ 
cantly greater gains in symptom improvement, 
functioning, and academic competence, with 
effect sizes ranging from d = 0.44 to 0.87. 
Details of the study’s methodology, which 
mirror the methods of the current study, can be 
found in the article by Walker et al. (2009). 

STUDY PURPOSE 

The purpose of this study was to follow 
up previous efficacy research to examine sus¬ 
tained effects of First Step. The follow-up 
study addressed an increasingly salient ques¬ 
tion, as researchers and policymakers con¬ 
curred that finding positive effects of an inter¬ 
vention at its conclusion is not a sufficient 
basis for it to be taken to scale. In that vein, a 
committee of the Society for Prevention Re¬ 
search (SPR) established standards for identi¬ 
fying effective prevention programs that go 
beyond measuring effects immediately at 
postintervention to include showing effects in 
randomized controlled trials with long-term 
follow-up research. In fact, the SPR recom¬ 
mends “multiple follow-ups to examine the 
nature of the time-course of the program ef¬ 
fects” (Flay et al., 2005, p. 161). 

In line with these standards, this study 
used follow-up data from the identical sample 
of early elementary school participants from 
the First Step efficacy study reported earlier 
(Walker et al., 2009) to address two questions: 
(a) What effects on student behavior and aca¬ 
demics did First Step achieve and sustain 1 
year postintervention? (b) What were the dif¬ 
ferences in outcomes over time for First Step 
students exposed to high versus low
implementation fidelity?

METHOD 

Participants 

The study by Walker et al. (2009), from 
which we report follow-up results here, was 
the result of a collaboration between the First 
Step development team at Oregon Research 
Institute (ORI) and evaluators at SRI Interna¬ 
tional. ORI sought to assess the fidelity and 
impact of First Step within an ethnically and 
linguistically diverse school district. Thus, the 
team recruited 202 first- through third-grade 
general education teachers and 202 students 
and their parents from 34 public schools in a 
large Southwestern urban district that served 
more than 94,000 students. About 13% of 
those students received special education ser¬ 
vices, 57% were Hispanic, and 40% were el¬ 
igible for free or reduced-price lunches. 

Measures 

Outcomes 

Ten measures were used to address the 
outcome domains of prosocial/adaptive behav¬ 
ior, problem/maladaptive behavior, and aca¬ 
demic performance. ORI researchers collected 
outcome data for intervention and comparison 
students at baseline, immediately on comple¬ 
tion of First Step (posttest), as well as 1 year 
after First Step completion (follow-up). Be¬ 
cause students had advanced to a new grade by 
the time the follow-up measures were col¬ 
lected, a different teacher completed these as¬ 
sessments than had completed the pretest and 
posttest measures; however, in all cases, the 
same parent or guardian was asked to com¬ 
plete each of the parent-rated measures. 

Systematic Screening for Behavior 
Disorders 

In addition to serving as a baseline 
screening tool, the Systematic Screening for 
Behavior Disorders (SSBD) Adaptive Behav¬ 
ior Index (ABI; a = 0.82) and Maladaptive 
Behavior Index (MBI; a = 0.84) also mea¬ 
sured outcomes at posttest and follow-up. The 
SSBD has excellent psychometric characteris¬ 


tics, is nationally normed, and has been used 
in a number of research studies for both 
screening and outcome measures. For exam¬ 
ple, Lane, Menzies, Oakes, and Kalberg 
(2012) recently documented how the SSBD 
could be used to enhance and evaluate the 
impact of behavioral-academic interventions, 
and other studies found positive convergent 
validity between the SSBD and other outcome 
measures, including the Scale for Assessing 
Emotional Disturbance (Epstein & Cullinan, 
1998) and the Behavioral and Emotional Rat¬ 
ing Scale (Epstein & Sharma, 1998). Further¬ 
more, the SSBD indices have documented sta¬ 
tistically significant gains produced by First 
Step (Walker et al., 2009), showing effect 
sizes of 0.82 (ABI) and 0.87 (MBI). The 
strong psychometric properties and relative 
sensitivity of the SSBD indices justify their 
general use as outcome measures for short¬ 
term interventions with general education stu¬ 
dents in classroom settings. 

Academic Engaged Time 

Researchers who had been trained to 
high reliability (minimum interobserver agree¬ 
ment of 0.80) and were blind to group assign¬ 
ments directly observed students in interven¬ 
tion and comparison classrooms to collect 
measures of academic engaged time (AET), an 
indicator of students’ academic involvement 
and adjustment to classroom expectations 
(Walker & Severson, 1990). Procedures used 
to observe and score the sessions mirrored 
those manualized for SSBD Stage 3 (Walker 
& Severson, 1990), as reported by Walker et 
al. (2009). Using a stopwatch recording pro¬ 
cedure, observers documented the proportion 
of time during two 15-min intervals (collected 
on different days within 1 week of one an¬ 
other) in which students attended to teacher 
instructions and academic tasks, made rele¬ 
vant motor responses, and appropriately inter¬ 
acted with other teachers and peers. To mini¬ 
mize the effects of varying classroom contexts 
to the extent possible, observers collected 
AET data at the same time of day and during 
similar classroom activities across all data col¬ 
lection periods. Furthermore, across the waves 
of data collection, reliability estimates were 
collected on 33% of the recorded AET
observations. Observers were retrained as necessary
to minimize drift and ensure adequate
reliability of recorded observations. The overall
intraclass correlation (ICC) of AET interrater
reliability was excellent (ICC[3,1] = 0.80).

Social Skills Rating System 

The Social Skills Rating System (SSRS) 
is a standardized, norm-referenced instrument 
with strong psychometric properties: content, 
construct, and concurrent validity has been re¬ 
searched extensively (Nangle, Hansen, Eardley, 
& Norton, 2009). Teachers completed three sub¬ 
scales of the SSRS-Teacher Version (Gresham 
& Elliott, 1990) that measured students’ social 
skills (SSRS-SS-T, a = 0.88), problem behav¬ 
iors (SSRS-PB-T, a = 0.85), and academic 
competence (SSRS-AC-T, a = 0.91; Gresham 
& Elliott, 1990). Parents completed the SSRS- 
Parent Version of the social skills subscale 
(SSRS-SS-P, a = 0.88) and problem behavior 
subscale (SSRS-PB-P, a = 0.88; Gresham & 
Elliott, 1990). 

Woodcock-Johnson III Letter-Word
Identification

The Woodcock-Johnson III is a norm- 
referenced assessment with strong psychomet¬ 
ric properties (Cizek, 2001). To assess stu¬ 
dents’ abilities to identify isolated letters and 
words, researchers administered the Letter- 
Word Identification (LWI) subtest (a = 0.91) 
from the Woodcock-Johnson III Diagnostic 
Reading Battery (WJ LWI; Woodcock, 
Mather, & Schrank, 2004). Students’ standard 
scores, reported here, reflect each student’s 
level of performance compared with the gen¬ 
eral population of students of his or her same 
grade level and age. 

Oral Reading Fluency 

Researchers computed an oral reading 
fluency (ORF) score based on the average 
number of words read correctly by a student 
in 1 min from two different 300- to 400-word 
first-grade level reading passages previously 
used in national studies (Fuchs, 2003). 

Table 1 shows the mean and standard 
deviation of each outcome measure at base¬ 
line, posttest, and follow-up for intervention 


and comparison students. Figures 1 and 2 
show the mean values and standard errors for 
the behavior and academic outcome measures, 
respectively, at the three measurement times. 

Fidelity 

ORI researchers used the Implementa¬ 
tion Fidelity Checklist (IFC, a = 0.94; Walker 
et al., 2009) to document the extent to which 
the coach and teacher delivered First Step 
components as intended and with high quality. 
Researchers observed the implementer in the
classroom four times: once for the coach on
or around Program Day 5 and three times for
the teacher on or around Program Days 10, 17,
and 24. Observers rated the implementer on 18
intervention components (e.g., whether the
implementer announced the number of points
needed for a reward, elicited cooperation from
classroom peers, provided positive feedback,
and used verbal reminders to prompt the
student). For each intervention component,
observers rated implementation adherence (yes
or no) and rated quality on a 5-point scale (1,
very poor/not delivered; 2, poor; 3, okay; 4,
good; 5, excellent). The ICC assessing
interrater reliability for 33% of the fidelity
observations was excellent (ICC[3,1] = 0.94). A
classroom fidelity score was calculated as the
average quality score across the 18
components and 4 observation periods.
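
The fidelity score described above can be illustrated with a minimal sketch. The function name and the example ratings are hypothetical, and the source does not say how missing ratings were handled, so the sketch simply assumes a complete set of 1-to-5 quality ratings for every component in every observation period.

```python
from statistics import mean


def classroom_fidelity_score(quality_ratings):
    """Average the 1-5 quality ratings over all components and periods.

    quality_ratings: one list per observation period, each holding the
    quality rating for every intervention component (18 per period and
    4 periods per classroom in the study).
    """
    all_ratings = [r for period in quality_ratings for r in period]
    if not all_ratings:
        raise ValueError("no ratings supplied")
    if any(not 1 <= r <= 5 for r in all_ratings):
        raise ValueError("quality ratings must lie on the 1-5 scale")
    return mean(all_ratings)


# Hypothetical classroom: all components rated 4 ("good") in three
# observation periods and 3 ("okay") in the remaining one.
example_ratings = [[4] * 18, [4] * 18, [3] * 18, [4] * 18]
example_score = classroom_fidelity_score(example_ratings)
```

With equal numbers of components per period, pooling all ratings gives the same value as averaging the four period means.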

Procedures 

Teachers were randomly assigned to in¬ 
tervention or comparison groups within two 
cohorts (the 2005-2006 and 2006-2007 
school years). Using screening methods de¬ 
scribed later, teachers identified eligible stu¬ 
dents for study participation. Comparison- 
group teachers continued to use their typical 
instructional and classroom management tech¬ 
niques and to refer students to additional ser¬ 
vices as available and warranted, with no sup¬ 
port other than what the school typically of¬ 
fered. Results of group-equivalence analyses 
(Woodbridge et al., 2010) indicated that First 
Step students were not significantly different 
from comparison-group students on measures 
of demographic characteristics, school factors, 
or baseline behavioral or academic measures. 
Figure 1. Behavior Outcomes by Assessment Time and Group (a-f)

[Figure not reproduced. Panels plot group means with ±1 SEM error bars for Controls and First Step against Assessment Time (Baseline, Posttest, Follow-up).]

Abbreviations: ABI, Adaptive Behavior Index; MBI, Maladaptive Behavior Index; PB, Problem Behavior subscale; SS,
Social Skills subscale; SSBD, Systematic Screening for Behavior Disorders; SSRS, Social Skills Rating System.


ORI’s role in the trial involved training 
behavior coaches and teachers in the interven¬ 
tion protocol, monitoring and supporting im¬ 
plementation fidelity (e.g., by providing con¬ 
sultation to participating teachers), and col¬ 
lecting outcome data. Coaches were in close 
contact with ORI supervisory staff and were 
themselves scheduled to regularly undergo fi¬ 
delity monitoring checks to review their ad¬ 
herence to the First Step implementation 
protocol. 

We determined that there would be min¬ 
imal risk of contamination in this design from 


teachers in different groups exchanging First 
Step behavioral procedures or materials be¬ 
cause (a) First Step is a manualized interven¬ 
tion that requires a coach’s training and sup¬ 
port in the classroom for a prescribed period, 
(b) no intervention or comparison students 
were placed in the same classroom, and (c) 
teachers participating in First Step in the first 
cohort were not assigned to the comparison 
condition in the second cohort. Furthermore, 
through the duration of the study, the ORI 
team did not report any contaminants that dif¬ 
ferentially affected classrooms. 
Figure 2. Academic Outcomes by Assessment Time and Group (a-d)

[Figure not reproduced. Panels plot group means with ±1 SEM error bars for Controls and First Step against Assessment Time (Baseline, Posttest, Follow-up).]

Abbreviations: AC, Academic Competence subscale; AET, academic engaged time; ORF, oral reading fluency; SSRS,
Social Skills Rating System; WJ-LWI, Woodcock-Johnson III Letter-Word Identification subtest.


After becoming familiar with students for 
at least 30 days, teachers reviewed descriptions 
of externalizing behaviors (e.g., displaying ag¬ 
gression, arguing, disturbing others, fighting) 
and nominated five students in the class who 
exhibited the highest levels of such behaviors. 
For the three highest-ranked students, teachers 
completed brief ratings of students’ behaviors. 
Teachers in both intervention and comparison 
classrooms used an abbreviated version of the 
SSBD (Walker & Severson, 1990) composed of 
the (a) ABI, 12 items (e.g., follows classroom 
rules, cooperates with peers) rated on a 5-point 
Likert scale ranging from 1 (never) to 5 (frequently);
(b) MBI, 11 items (e.g., refuses to
participate, creates a disturbance) rated on the 
same 5-point Likert scale; and (c) Critical Events 
Index (CEI), which indicated how many of 30 
high-saliency, low-frequency indicators (e.g., 
stealing, physical aggression) occurred during 
the preceding 30 days for each student. 

The student with the highest combined 
score on the ABI, MBI, and CEI was the first 


to be invited to participate in the study. If two 
students in a classroom had the same score, 
the student with the higher raw CEI score was 
selected first. If the parents of that student did 
not consent to their child’s participation in the 
study, consent was sought from the parents of 
the next highest-scoring student. Although the 
SSBD has an optional Stage 3 that involves 
classroom and playground observations, it was 
deemed too labor intensive for the purposes of 
this study. 
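
The invitation rule described above can be sketched as follows. This is an illustration only: the source does not state how the ABI, MBI, and CEI were combined into one score, so the sketch takes a precomputed `combined` value as given, and the field names, student identifiers, and consent callback are all hypothetical.

```python
def invitation_order(candidates):
    """Rank screened students for invitation.

    Students are ordered by combined screening score (highest first);
    ties are broken by the higher raw CEI score, as described in the text.
    Each candidate is a dict with hypothetical keys "id", "combined", "cei".
    """
    return sorted(candidates, key=lambda s: (s["combined"], s["cei"]), reverse=True)


def select_participant(candidates, has_consent):
    """Return the id of the first student, in invitation order, whose
    parents consent; return None if no parent consents."""
    for student in invitation_order(candidates):
        if has_consent(student["id"]):
            return student["id"]
    return None


# Hypothetical classroom with a tie on the combined score.
classroom = [
    {"id": "A", "combined": 40, "cei": 2},
    {"id": "B", "combined": 40, "cei": 5},
    {"id": "C", "combined": 35, "cei": 9},
]
ranked_ids = [s["id"] for s in invitation_order(classroom)]
# If the parents of student B decline, the next-ranked student is invited.
chosen = select_participant(classroom, lambda sid: sid != "B")
```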

Statistical Analysis 
Data Imputation 

To avoid attrition and potential bias, we
imputed missing values using fully conditional
specification models (i.e., logistic, polytomous,
or linear regression) applied iteratively
via Stata’s imputation-by-chained-equations
procedure (Royston, 2004; Van Buuren, Brand,
Groothuis-Oudshoorn, & Rubin, 2006).
Variables used in the imputation by chained
equations included the baseline, posttest, and
follow-up values for the 10 outcome mea¬ 
sures; student-specific variables included stu¬ 
dent age, grade level, gender, race or ethnicity, 
language, English language learner status, free 
and reduced-price lunch program status, and 
special education status; teacher-specific vari¬ 
ables included a summary measure of teacher 
self-reported knowledge and skills in working 
with students with behavior problems 
(Cheney, Walker, & Blum, 2004); and school- 
specific variables included a school mobility 
score, suspension rate, and summary measure 
of the presence of school-wide positive behav¬ 
ior support on assessment with the School¬ 
wide Evaluation Tool (Horner et al., 2004). 
These covariates were used in previous anal¬ 
yses of the main effects of First Step (Sumi et 
al., 2013); their influence on the imputed val¬ 
ues was small but not negligible. 

The percentage of data that were im¬ 
puted ranged from 4% to 10% for baseline, 
from 5% to 9% for posttest, and from 14% to 
26% for follow-up measures. Twenty imputa¬ 
tions were conducted separately for interven¬ 
tion- and comparison-group students for each 
missing value. These results were combined to 
provide estimates of the variability and p val¬ 
ues for regression coefficients. 
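
The source does not name the rule used to combine results across the 20 imputed data sets. The sketch below assumes the conventional choice, Rubin's rules, in which the pooled coefficient is the mean of the per-imputation estimates and the pooled variance adds the average within-imputation variance to an inflated between-imputation variance. The function name and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one regression coefficient across m imputed data sets.

    Assumes Rubin's rules:
      pooled estimate = mean of the m estimates
      total variance  = W + (1 + 1/m) * B
    where W is the mean squared standard error (within-imputation
    variance) and B is the sample variance of the estimates
    (between-imputation variance).
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical coefficient estimates from three imputed data sets.
pooled_est, pooled_se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The pooled standard error exceeds any single-imputation standard error whenever the estimates differ across imputations, which is how the procedure carries the uncertainty about the missing values into the reported p values.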

Analysis of Intervention Effects 

In the previous First Step efficacy study,
Walker et al. (2009) applied multivariate
analysis of covariance with baseline values as
covariates to examine posttest results. Our
analysis used hierarchical linear modeling (HLM)
to examine outcomes at all three observation
intervals (baseline, posttest, and follow-up) in
the same regression. This nested model
accounted for uncertainty in the baseline
measures and also used centered fidelity checklist
(IFC) values as covariates. We analyzed
posttest results for reference only because these
results were reported by Walker et al. (2009);
however, it should be noted that effect size
estimates in the previous study were larger in
absolute value than those in our study.
Differences in effect size values ranged from 0.03 on
the SSRS-PB-T (effect size of -0.73 in the
previous study versus -0.70 in this study)
to 0.36 on the SSRS-AC-T (effect size of 0.66
versus 0.30).

We performed HLM regressions with 
repeated measures on each set of imputed data 
to estimate intervention effects at posttest and 
follow-up and to estimate the effects of imple¬ 
mentation fidelity. The model included inde¬ 
pendent terms for time, treatment, and fidelity 
and variance terms for school, student nested 
within school, and time nested within student. 
Each student was associated with baseline, 
posttest, and follow-up values for the depen¬ 
dent variables (either observed or imputed), as 
shown in the following model: 

$$
Y_{ijk} = B_0 + B_1 T_{ij} + B_2 (J_1 + J_2) + B_3 T_{ij} (J_1 + J_2)
+ B_4 J_2 + B_5 T_{ij} J_2 + B_6 T_{ij} (J_1 + J_2) F_{ij}
+ B_7 T_{ij} J_2 F_{ij} + e_j + e_{ij} + e_{ijk} \tag{1}
$$

In this model, $i$ is an index for student
within School $j$; $j$ is an index for school; $k$ is an
index for time, with 0 indicating baseline, 1
indicating posttest, and 2 indicating follow-up;
$Y_{ijk}$ is the value of the dependent variable for
the $i$th student in the $j$th school at Time $k$; $J_k$ is
an indicator for Time $k$; $T_{ij}$ is an indicator for
Student $(i,j)$ being in the intervention group; $F_{ij}$
is the centered IFC fidelity value for student
$(i,j)$, which was centered among intervention
students and takes values of 0 for comparison
students; $e_j$ represents residual variability
between schools; $e_{ij}$ represents residual
variability between students within schools; $e_{ijk}$
represents residual variability between times
within students (also including measurement
error); $B_0$ is the expected mean for comparison
students at Time 0; $B_1$ is the difference
between intervention and comparison students at
baseline; $B_2$ is the increase in outcome
between baseline and posttest for comparison
students; $B_3$ is the effect of intervention on
change in outcome from baseline to posttest;
$B_4$ is the increase in outcome between posttest
and follow-up for comparison students; $B_5$ is
the effect of intervention on change in
outcome from posttest to follow-up; $B_6$ is the
effect of fidelity on intervention students from
baseline to posttest; and $B_7$ is the effect of
fidelity on intervention students from posttest
to follow-up. The terms with coefficients $B_6$
and $B_7$ were only used in analyses to examine
the effect of fidelity.
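
To make the time coding in the full model concrete, the following sketch builds the regressors that multiply the eight fixed-effect coefficients for a single observation and evaluates the expected outcome, omitting the random terms for school, student, and time. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def fixed_effect_row(time, treated, fidelity=0.0):
    """Regressors multiplying B0..B7 for one student at one time point.

    time: 0 = baseline, 1 = posttest, 2 = follow-up.
    treated: True for intervention students, False for comparison.
    fidelity: centered fidelity value (forced to 0 for comparison students).
    """
    if time not in (0, 1, 2):
        raise ValueError("time must be 0, 1, or 2")
    j1 = 1 if time == 1 else 0
    j2 = 1 if time == 2 else 0
    t = 1 if treated else 0
    f = fidelity if treated else 0.0
    post = j1 + j2  # equals 1 at both posttest and follow-up
    return [1, t, post, t * post, j2, t * j2, t * post * f, t * j2 * f]


def expected_outcome(coefs, time, treated, fidelity=0.0):
    """Fixed-effect part of the model: the sum of B_n times its regressor."""
    row = fixed_effect_row(time, treated, fidelity)
    return sum(b * x for b, x in zip(coefs, row))


# Hypothetical coefficients B0..B7 (illustrative values only).
B = [50.0, 1.0, 2.0, 5.0, -1.0, -4.0, 0.0, 0.0]
comparison_followup = expected_outcome(B, time=2, treated=False)
intervention_followup = expected_outcome(B, time=2, treated=True)
```

Because the posttest indicator stays switched on at follow-up, the coefficients on the follow-up indicator measure change from posttest rather than from baseline. The group difference at follow-up is therefore the sum of the baseline difference, the baseline-to-posttest intervention effect, and the posttest-to-follow-up intervention effect.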

A simplified version of the full HLM 
model was used to examine the statistical sig¬ 
nificance of changes in one of the two groups 
over time, fit using only comparison-group 
data or only intervention-group data: 

$$
Y_{ijk} = B_0 + B_2 (J_1 + J_2) + B_4 J_2 + e_j + e_{ij} + e_{ijk} \tag{2}
$$

Intervention effects were examined for 
multiple outcomes in three domains: (a) proso¬ 
cial/adaptive behavior (ABI, SSRS-SS-T, 
SSRS-SS-P); (b) problem/maladaptive behav¬ 
ior (SSBD-MBI, SSRS-PB-T, SSRS-PB-P); 
and (c) academic (SSRS-AC-T, AET, WJ 
LWI, ORF). Because we performed multiple 
tests for intervention effects, we applied the 
Benjamini-Hochberg correction for Type I er¬ 
ror rate (Schochet, 2008) within each domain 
(i.e., for a given test, the reported p value was 
the smallest false discovery rate value for 
which the corresponding null hypothesis was 
rejected). Effect sizes are reported using a 
statistic analogous to Cohen’s (1988) d, cal¬ 
culated by dividing the treatment indicator 
coefficient by an estimate of the pooled be- 
tween-student standard deviation at posttest or 
follow-up. 
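
The adjusted p value and effect size described above can be sketched as follows. The Benjamini-Hochberg adjustment shown is the standard step-up form, in which each adjusted value is the smallest false discovery rate at which that hypothesis would be rejected. The function names and inputs are hypothetical, and the pooled standard deviation is taken as given because the source does not detail how it was estimated.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    For the p value with ascending rank r among m tests, the adjusted value
    is the minimum of p * m / rank over all ranks >= r, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the running minimum enforces
    # monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def effect_size(treatment_coef, pooled_sd):
    """Statistic analogous to Cohen's d: coefficient over pooled SD."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return treatment_coef / pooled_sd


# Hypothetical unadjusted p values for the three measures in one domain.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

Applying the correction within each domain, as the text describes, means the function would be called once per domain with only that domain's p values.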

RESULTS 

Research Question 1: Effects Achieved 
at Posttest and Sustained at Follow-Up 

Posttest Effects 

Table 2 summarizes the treatment coef¬ 
ficients at posttest and follow-up using the 
statistical model described earlier, excluding 
the fidelity variables. As shown in the Base¬ 
line to Posttest columns, First Step students 
achieved significantly greater improvements 
than comparison students on all behavior mea¬ 
sures, with positive treatment coefficients for 
adaptive behavior and negative treatment co¬ 
efficients for maladaptive behavior measures 
(significant effects in a beneficial direction for 


the intervention group are indicated by an up 
arrow). First Step students also achieved pos¬ 
itive and significant effects for SSRS-AC-T, 
both before and after adjustment for multiple 
tests, and for AET before adjustment. 

Sustained Effects 

As shown in the Posttest to Follow-Up 
columns of Table 2, First Step posttest effects 
were not sustained through the follow-up mea¬ 
surement point. In fact, First Step students 
differed significantly from comparison stu¬ 
dents in the undesired direction on five of the 
six behavior measures (the exception being 
SSRS-PB-P), with negative treatment coeffi¬ 
cients for adaptive behavior and positive treat¬ 
ment coefficients for maladaptive behavior 
measures. Treatment effects also were nega¬ 
tive and statistically significant for the aca¬ 
demic measures of SSRS-AC-T and AET (sig¬ 
nificant effects in an adverse direction for the 
intervention group are indicated by a down 
arrow). As shown in the Baseline to Follow-Up 
columns, the overall change from baseline to 
follow-up for SSRS-PB-P was the only statis¬ 
tically significant effect after adjustment for 
multiple comparisons. 

To interpret the scoring pattern that re¬ 
sulted in these findings, we present the param¬ 
eters of the HLM regression that correspond to 
the change from baseline to posttest, posttest 
to follow-up, and baseline to follow-up for 
both groups and their associated standard
errors in Table 3. Withdrawal of intervention
was associated with First Step students’ scores
deteriorating significantly on five measures of
behavior from posttest to follow-up: scores
decreased on the SSBD-ABI (-5.7 points, p < .001),
SSRS-SS-T (-8.9 points, p < .001), and
SSRS-SS-P (-4.4 points, p < .01) and
increased on the SSBD-MBI (3.7 points, p <
.01) and SSRS-PB-T (4.5 points, p < .01). In
contrast, the comparison group generally 
maintained behaviors at posttest levels. In the 
academic domain, the comparison and First 
Step groups had similar point decreases on the 
WJ LWI (—2.9 and —3.3, respectively; p < 
.001) and similar increases on the ORF (22.0 
and 22.1, respectively; p < .001). On the 
AET, First Step students showed an increase 
Table 2. HLM Results and Effect Sizes for First Step at Postintervention Time Points

Abbreviations: ABI, Adaptive Behavior Index; AC, Academic Competence subscale; AET, academic engaged time; HLM, hierarchical linear modeling; MBI, Maladaptive Behavior
Index; ORF, oral reading fluency; P, parent; PB, Problem Behavior subscale; SS, Social Skills subscale; SSBD, Systematic Screening for Behavior Disorders; SSRS, Social Skills Rating
System; T, teacher; WJ III LWI, Woodcock-Johnson III Letter-Word Identification subtest.
Note a: p value after applying the Benjamini-Hochberg correction for multiple comparisons.

Table 3. HLM Results for Separate Comparison and Intervention Analyses

Abbreviations: ABI, Adaptive Behavior Index; AC, Academic Competence subscale; AET, academic engaged time; HLM, hierarchical linear modeling; MBI, Maladaptive Behavior
Index; ORF, oral reading fluency; P, parent; PB, Problem Behavior subscale; SS, Social Skills subscale; SSBD, Systematic Screening for Behavior Disorders; SSRS, Social Skills Rating
System; T, teacher; WJ III LWI, Woodcock-Johnson III Letter-Word Identification subtest.
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of 8.1 points (p < .001) but the comparison 
group showed a larger increase (17.0 points, 
p < .001). These changes in academic mea¬ 
sures eliminated the significant differences in 
SSRS-AC-T and AET between groups that 
were present at posttest. 

We also examined the ICC at the school
level because the extent to which outcomes
varied by school has potential implications for
understanding the generalizability of the intervention.
Across all outcome measures and time
points, the proportion of variance attributable to
school ranged between 0.18 and 0.31, with comparable
average ICCs at baseline (0.25), posttest
(0.25), and follow-up (0.23).
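
For readers less familiar with the statistic, a school-level ICC can be read as the share of total outcome variance that lies between schools. The sketch below assumes the conventional variance-ratio definition from a two-level model; the report does not state the exact computation, and the function name and variance components are hypothetical.

```python
def intraclass_correlation(between_var, within_var):
    """Proportion of total variance attributable to the grouping level.

    Assumes the conventional definition:
        ICC = between-group variance / (between-group + within-group variance)
    """
    total = between_var + within_var
    if total <= 0:
        raise ValueError("Total variance must be positive.")
    return between_var / total


# Hypothetical variance components (not taken from the study):
# 25 units of variance between schools, 75 units within schools.
icc = intraclass_correlation(25.0, 75.0)
```

Under these made-up components, one quarter of the variance would sit at the school level, which is the same order of magnitude as the average ICCs reported above.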

Research Question 2: Relationships Between Fidelity and Student Outcomes

The mean classroom fidelity score was 
calculated as 3.73 (i.e., between okay and 
good on the 5-point scale), with an SD of 0.53 
and a range of 2.23 to 4.86. Although overall 
fidelity ratings were within an acceptable 
range, we assessed whether First Step students 
whose intervention was implemented with 
higher fidelity (i.e., adherence and quality) 
achieved better outcomes than students whose 
intervention was delivered with lower fidelity. 

Table 4 shows the relationship of implementation
fidelity to changes in student outcomes
from baseline to posttest and from posttest
to follow-up. The only statistically significant
effect among the 30 relationships tested
was a negative relationship between implementation
fidelity and students’ academic engagement
from posttest to follow-up (a
1-standard deviation increase in fidelity lowered
students’ academic engagement by almost
4 standard deviations, p = .016).

DISCUSSION 

Effects of First Step at Follow-Up 

The criteria for establishing the effectiveness
of an intervention, as set forth by the
SPR, include a rigorous demonstration of effects
on a representative sample on outcomes
measured immediately and at least 6 months
after the intervention. Previous research, including
multiple randomized trials, showed
First Step’s positive effects on behavior and
academic outcomes at posttest. However, the
analyses of follow-up data from a recent efficacy
trial reported here show that First Step
did not meet the criterion for duration of effect.
That is, the significant positive effects of
First Step at the conclusion of the prescribed
intervention were no longer evident 1 year
after intervention on measures of student behavior
or most academic outcomes.

The closing of the gaps in behavioral
outcomes resulted primarily from First Step
students’ scores declining with the withdrawal
of the intervention and transition to a new
teacher/grade level, whereas comparison students’
scores remained fairly stable. In contrast,
no clear pattern is evident for academic
measures. On the measure of academic competence,
the intervention group’s loss and the
comparison group’s gain from posttest to follow-up
eliminated the gap that initially favored
First Step students at posttest. The comparison
group’s growth in academic engagement
also outstripped the posttest advantage of
First Step students. On the WJ LWI, the comparison
group showed posttest growth but both
groups declined by follow-up to levels equal
to or below baseline. However, similar increases
in ORF maintained the parity of the
two groups that was apparent at posttest
through to follow-up.

Findings based on parent perspectives
are especially intriguing because, unlike
teachers, the respondents are consistent across
periods. On these measures of behavior, parents
of comparison-group students indicated
their children showed generally steady (but
nonsignificant) improvements in problem behavior
and social skills from baseline to follow-up.
In contrast, parents of First Step students
reported significant behavioral improvements
immediately after intervention, including
a reduction in problem behaviors at follow-up,
but ratings indicated an erosion of gains in social
skills from posttest to follow-up.

The lack of sustainment of First Step’s
effects (and that of some other short-term interventions)
should be considered in the theoretical
context of the intervention. One theory
might posit that the mechanisms of change
(e.g., change in teacher behavior) will produce
change in child behavior, which is then internalized
by the target child and thus sustained
through naturally reinforcing contingencies.
Unfortunately, long-term findings clearly do
not support this prevention theory. Rather,
First Step teachers learned effective strategies
for eliciting and reinforcing the student’s positive
behaviors within their current classroom
environments, but the program did not show
prevention outcomes across years in this
study. Students’ participation in 30 program
days may not have been sufficient to produce
the theoretical chain of events resulting in the
sustainability of intervention effects within
new environments where First Step was not
implemented.

Two phenomena may explain both the
absence of maintained gains in First Step students’
behavior and the greater improvement
in comparison students’ behavior. First, the
pattern might be explained by the change in
the behavioral environment of students’ classrooms
in the year after intervention. That is,
by posttest, First Step teachers and classroom
peers had learned how to elicit positive behavior
from the target students, behavior that was
reflected in improvements in students’ behavioral
assessments. However, the following
year’s teachers and classroom peers had no
such training. In the absence of the reinforcement
for appropriate behavior that First Step
students had come to expect, their behavior
may have reverted to prior patterns of poor
behavior.

Second, posttest-to-follow-up improvements
in comparison students’ behavior may
reflect different teacher and peer perceptions
and expectations. Disruptive students can suffer
reputational bias, where teachers and peers
pejoratively judge the student’s behavior even
after it undergoes improvements in the short
term (Hollinger, 1987; Maag, Vasa, Kramer,
& Torrey, 1991). The students in the study
were initially selected by the implementation-year
teachers as those with the most severe
externalizing behavior problems in the classroom.
As students matriculated from the implementation
to the follow-up year, teachers
and students at those subsequent grade levels
may have had different perceptions and expectations
for behavior than those in the previous
year’s lower grades (i.e., a different composition
of students in a class may influence a
teacher’s perspective on severe behavior). Biases
may have caused a negative reputation to
persist over time despite any beneficial response
the First Step students initially showed.
Generalization and maintenance of social skill
interventions require embedding the target
students in peer social systems and classroom
structures that continue to support the learned
behaviors (Farmer, Van Acker, Pearl, & Rodkin,
1999; Maag, 2006).

Each of these contextual factors may
account for some degree of variance in the
deterioration of effects in First Step outcomes
across school years, and cumulatively, these
phenomena can be quite powerful. As a selected
intervention program, First Step might
produce sustained effects if implemented in
the context of an effective Tier 1, universal
prevention program, but this hypothesis warrants
further investigation.

There also are lessons to be learned from
other psychosocial programs that have shown
impressive long-term outcomes. Programs
with long-term effects share particular characteristics
that short-term applied interventions
like First Step do not. Namely, they (a) span
many years at full dosage with sustained implementation
and (b) show prevention and intervention
effects on distal outcome variables
(e.g., drug and alcohol use, school failure and
dropout, arrests, assignment to specialized services,
health risks such as disease and pregnancy
rates) rather than proximal outcomes
(e.g., behavioral observations, teacher and parent
ratings; Borduin et al., 1995; Conduct
Problems Prevention Research Group, 2011;
Eddy, Reid, & Fetrow, 2000; Hawkins, Catalano,
Kosterman, Abbott, & Hill, 1999).

Relationships Between Fidelity and Effects at Follow-Up

Analyses reported here indicate that students
who experienced First Step at higher
fidelity also had significantly greater erosion
from posttest to follow-up in intervention benefits
on a single measure (AET) than students
who experienced First Step implemented with
lower fidelity. Although a single significant
result may not indicate an overarching trend,
the AET findings (based on external direct
observations of student behavior, not participant
perspectives) may be worthy of further
investigation and discussion. A possible explanation
for this pattern is that students who
experienced First Step at high fidelity also
experienced the greatest contrast between the
intervention classroom and the more typical
classroom they entered the following year.
The potentially significant disruption from one
year to the next in what First Step students had
come to expect in the way of teachers’ responses
to positive and negative behavior and
classroom management practices might have
prompted students to revert to prior patterns of
poor academic engagement. Barkley (2007)
attributed this phenomenon to interventionists
designing altered environments (i.e., behavioral
methods and contingency management
systems) to reduce behavior symptomatology
that cannot be sustained with natural contingencies
alone once the intervention is withdrawn.
Among students who experienced First
Step at lower fidelity of implementation, less
contrast between the implementation-year and
second-year classrooms might have prompted
relatively less decline in behavior.

Future Directions 

As shown by posttest results in this and
previous studies, First Step procedures effectively
teach students to regard their teachers’
and peers’ guidance and reinforcement as discriminative
stimuli that elicit appropriate behavior;
however, these intervention effects
erode after the stimuli are withdrawn. Barkley
(2007) compared behavioral interventions to
medication management programs, which
could produce benefits as long as they were in
place but would not maintain or generalize
once ceased. In fact, Barkley (2007) referred
to the expectation of postextinction “bursts”
(p. 280) of heightened problematic behavior
on the withdrawal of positive reinforcement
for their previous occurrences. The data from
this study suggest the importance of working
with the teachers and classmates of prior First
Step students in the year after implementation
to reduce the probability of worsening behavior
problems and support the enduring effects
of the intervention.

Limitations 

Although the findings presented here 
were generated from a study that meets high 
standards for methodologic rigor, there are 
some limitations. For one, most behavioral 
outcome measures used in the study (other 
than the AET) were second-party reports of 
students’ behavior, not measured through direct
observation. Training and implementation
involved in First Step may have affected the 
perceptions and, thus, the ratings of student 
behavior by participating teachers; still, the 
behavior scales used in the study have very 
strong psychometric qualities. 

Furthermore, the absence of additional 
measurement intervals (e.g., 3 or 6 months 
after implementation, at the beginning of the 
school year after implementation) limits the 
ability to gauge the precise duration of First 
Step’s posttest effects. If effects began to deteriorate
quickly after posttest, encouraging
and enabling the intervention teacher to restate
expectations and reinforce positive behaviors
could prolong effects. However, if effects
were sustained while First Step students remained
in their intervention classroom, attention
to sustaining gains must shift to subsequent-year
teachers. Future evaluations of
First Step should include enough posttest measurement
points to depict the trajectory in
postintervention outcomes. 

CONCLUSION 

The findings presented here are the latest
additions to 2 decades of research and
development behind First Step. Developers’
efforts continue, with the intent to strengthen
and broaden the intervention, to develop further
the evidence of its effectiveness, and to
establish its readiness for dissemination to diverse
populations of students. The First Step
story underscores the reality that establishing
and disseminating evidence-based practices at
scale take a sustained, iterative effort over a
considerable period on the part of a research
community committed to finding effective,
scalable solutions to important problems.
Those efforts cannot be sustained without
funding sources that share that commitment
and understand the importance of continuity in
research programs.